-
Notifications
You must be signed in to change notification settings - Fork 649
Disable database triggers during import #1996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sgrif (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
I have not figured out how I can test this code change. What I did test was a hand edited Hand edited BEGIN;
-- Disable triggers on each table.
ALTER TABLE "categories" DISABLE TRIGGER ALL;
ALTER TABLE "crates" DISABLE TRIGGER ALL;
ALTER TABLE "keywords" DISABLE TRIGGER ALL;
ALTER TABLE "metadata" DISABLE TRIGGER ALL;
ALTER TABLE "reserved_crate_names" DISABLE TRIGGER ALL;
ALTER TABLE "teams" DISABLE TRIGGER ALL;
ALTER TABLE "users" DISABLE TRIGGER ALL;
ALTER TABLE "badges" DISABLE TRIGGER ALL;
ALTER TABLE "crates_categories" DISABLE TRIGGER ALL;
ALTER TABLE "crates_keywords" DISABLE TRIGGER ALL;
ALTER TABLE "crate_owners" DISABLE TRIGGER ALL;
ALTER TABLE "versions" DISABLE TRIGGER ALL;
ALTER TABLE "dependencies" DISABLE TRIGGER ALL;
ALTER TABLE "version_authors" DISABLE TRIGGER ALL;
ALTER TABLE "version_downloads" DISABLE TRIGGER ALL;
-- Set defaults for non-nullable columns not included in the dump.
ALTER TABLE "users" ALTER COLUMN "gh_access_token" SET DEFAULT '';
-- Truncate all tables.
TRUNCATE "categories" RESTART IDENTITY CASCADE;
TRUNCATE "crates" RESTART IDENTITY CASCADE;
TRUNCATE "keywords" RESTART IDENTITY CASCADE;
TRUNCATE "metadata" RESTART IDENTITY CASCADE;
TRUNCATE "reserved_crate_names" RESTART IDENTITY CASCADE;
TRUNCATE "teams" RESTART IDENTITY CASCADE;
TRUNCATE "users" RESTART IDENTITY CASCADE;
TRUNCATE "badges" RESTART IDENTITY CASCADE;
TRUNCATE "crates_categories" RESTART IDENTITY CASCADE;
TRUNCATE "crates_keywords" RESTART IDENTITY CASCADE;
TRUNCATE "crate_owners" RESTART IDENTITY CASCADE;
TRUNCATE "versions" RESTART IDENTITY CASCADE;
TRUNCATE "dependencies" RESTART IDENTITY CASCADE;
TRUNCATE "version_authors" RESTART IDENTITY CASCADE;
TRUNCATE "version_downloads" RESTART IDENTITY CASCADE;
-- Import the CSV data.
\copy "categories" ("category", "crates_cnt", "created_at", "description", "id", "path", "slug") FROM 'data/categories.csv' WITH CSV HEADER
\copy "crates" ("created_at", "description", "documentation", "downloads", "homepage", "id", "max_upload_size", "name", "readme", "repository", "textsearchable_index_col", "updated_at") FROM 'data/crates.csv' WITH CSV HEADER
\copy "keywords" ("crates_cnt", "created_at", "id", "keyword") FROM 'data/keywords.csv' WITH CSV HEADER
\copy "metadata" ("total_downloads") FROM 'data/metadata.csv' WITH CSV HEADER
\copy "reserved_crate_names" ("name") FROM 'data/reserved_crate_names.csv' WITH CSV HEADER
\copy "teams" ("avatar", "github_id", "id", "login", "name") FROM 'data/teams.csv' WITH CSV HEADER
\copy "users" ("gh_avatar", "gh_id", "gh_login", "id", "name") FROM 'data/users.csv' WITH CSV HEADER
\copy "badges" ("attributes", "badge_type", "crate_id") FROM 'data/badges.csv' WITH CSV HEADER
\copy "crates_categories" ("category_id", "crate_id") FROM 'data/crates_categories.csv' WITH CSV HEADER
\copy "crates_keywords" ("crate_id", "keyword_id") FROM 'data/crates_keywords.csv' WITH CSV HEADER
\copy "crate_owners" ("crate_id", "created_at", "owner_id", "owner_kind") FROM 'data/crate_owners.csv' WITH CSV HEADER
\copy "versions" ("crate_id", "crate_size", "created_at", "downloads", "features", "id", "license", "num", "published_by", "updated_at", "yanked") FROM 'data/versions.csv' WITH CSV HEADER
\copy "dependencies" ("crate_id", "default_features", "features", "id", "kind", "optional", "req", "target", "version_id") FROM 'data/dependencies.csv' WITH CSV HEADER
\copy "version_authors" ("id", "name", "version_id") FROM 'data/version_authors.csv' WITH CSV HEADER
\copy "version_downloads" ("date", "downloads", "version_id") FROM 'data/version_downloads.csv' WITH CSV HEADER
-- Drop the defaults again.
ALTER TABLE "users" ALTER COLUMN "gh_access_token" DROP DEFAULT;
-- Reenable triggers on each table.
ALTER TABLE "categories" ENABLE TRIGGER ALL;
ALTER TABLE "crates" ENABLE TRIGGER ALL;
ALTER TABLE "keywords" ENABLE TRIGGER ALL;
ALTER TABLE "metadata" ENABLE TRIGGER ALL;
ALTER TABLE "reserved_crate_names" ENABLE TRIGGER ALL;
ALTER TABLE "teams" ENABLE TRIGGER ALL;
ALTER TABLE "users" ENABLE TRIGGER ALL;
ALTER TABLE "badges" ENABLE TRIGGER ALL;
ALTER TABLE "crates_categories" ENABLE TRIGGER ALL;
ALTER TABLE "crates_keywords" ENABLE TRIGGER ALL;
ALTER TABLE "crate_owners" ENABLE TRIGGER ALL;
ALTER TABLE "versions" ENABLE TRIGGER ALL;
ALTER TABLE "dependencies" ENABLE TRIGGER ALL;
ALTER TABLE "version_authors" ENABLE TRIGGER ALL;
ALTER TABLE "version_downloads" ENABLE TRIGGER ALL;
COMMIT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! I also noticed this problem a few days ago, but so far didn't even manage to file an issue. :)
You can test this on your local instance by running
cargo run --bin enqueue-job dump_db
while the background worker is running. The dump will end up in the ./local-uploads
directory.
It looks like I cannot run the background worker on macOS? $ cargo run --bin background-worker
Finished dev [unoptimized + debuginfo] target(s) in 0.28s
Running `target/debug/background-worker`
<jemalloc>: option background_thread currently supports pthread only
memory allocation of 40 bytes failedAbort trap: 6 The README for jemallocator seems to agree that |
The crates.io backend is only meant to work on Linux (since this is how we run it in production). I'm not sure any of the components work when ran natively on Mac. There's a docker-compose config file available, which should make getting started with running in Docker straight-forward. There's also a bit of documentation for this in |
@bruceadams you can temporarily remove the "background_threads" feature
from jemalloc in Cargo.toml, or rebase on master now that #1956 has landed.
…On Thu, Dec 19, 2019, 09:00 Sven Marnach ***@***.***> wrote:
The crates.io backend is only meant to work on Linux (since this is how
we run it in production). I'm not sure any of the components work when ran
natively on Mac.
There's a docker-compose config file available, which should make getting
started with running in Docker straight-forward. There's also a bit of
documentation for this in docs/CONTRIBUTING.md in this repo.,
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#1996?email_source=notifications&email_token=AAAFNKQXUP4IEQGJ63TFHT3QZN47ZA5CNFSM4J4ZL2E2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHJWBFI#issuecomment-567500949>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAFNKW4PIJLOYIC34ACDA3QZN47ZANCNFSM4J4ZL2EQ>
.
|
I develop on macOS so I'm invested in making sure it works ;) It usually does :) |
After rebasing on master (which I haven't pushed up to this PR) 🤷 $ cargo run --bin background-worker
Finished dev [unoptimized + debuginfo] target(s) in 1.31s
Running `target/debug/background-worker`
Booting runner
Using local uploader, crate files will be in the local_uploads directory
Cloning index
thread 'main' panicked at 'Failed to clone index: Error { code: -1, klass: 2, message: "failed to resolve path \'file://./tmp/index-bare\': No such file or directory" }', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
$ ls -al tmp/index-bare
.rw-r--r-- 111 bruce 19 Dec 21:09 config
.rw-r--r-- 73 bruce 19 Dec 21:09 description
.rw-r--r-- 23 bruce 19 Dec 21:09 HEAD
drwxr-xr-x - bruce 19 Dec 21:09 hooks
drwxr-xr-x - bruce 19 Dec 21:09 info
drwxr-xr-x - bruce 19 Dec 21:09 objects
drwxr-xr-x - bruce 19 Dec 21:09 refs |
I did (that's what I was trying to show by listing $ script/init-local-index.sh
tmp/index-bare already exists, exiting
$ # Huh. Already there. Delete it and recreate...
$ rm -rf tmp
$ script/init-local-index.sh
Initializing repository in tmp/index-bare...
Creating checkout in tmp/index-bare...
Your local git index is ready to go!
Please refer to https://github.com/rust-lang/crates.io/blob/master/README.md for more info!
$ cargo run --bin background-worker
Finished dev [unoptimized + debuginfo] target(s) in 0.82s
Running `target/debug/background-worker`
Booting runner
Using local uploader, crate files will be in the local_uploads directory
Cloning index
thread 'main' panicked at 'Failed to clone index: Error { code: -1, klass: 2, message: "failed to resolve path \'file://./tmp/index-bare\': No such file or directory" }', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace. |
- This allows `updated_at` field values to be loaded during import. Before this change, `updated_at` was set to when the import ran, overwriting the values being loaded. - Also, the import completes in about one fifth the time (on my slow MacBook Air).
b6664ce
to
7415074
Compare
@bruceadams Sorry, I obviously didn't read your previous comment carefully. But I indeed remember seeing this issue before, and it even is on my "bugs to file" list on my other laptop… Apparently the reletaive
Feel free to file a bug about this if you have time – we at least need to document this, and preferably find a way to use a relative path to make it Just Work. |
Oops! My earlier comment sounded impatient or irritated. Sorry! I really appreciate your engaging with me on this, especially when my available time is so choppy. I’ll work to communicate my appreciation and joy more clearly in the future. It worked! 🎊 $ export GIT_REPO_URL=file:///Users/bruce/github/crates.io/tmp/index-bare
$ cargo run --bin background-worker
Finished dev [unoptimized + debuginfo] target(s) in 0.30s
Running `target/debug/background-worker`
Booting runner
Using local uploader, crate files will be in the local_uploads directory
Cloning index
Runner booted, running jobs
Database dump uploaded to db-dump.tar.gz. Thank you! And, the generated import.sql exactly matches my hand edited one. I done all the testing I can think of for this. (I also rebased this PR onto master as of the past hour or so.) Is there anything else I can do to help this PR along? I, personally, am in no hurry to get this merged, since I have a simple workaround for the problem being fixed here. |
No worries, you did not sound irritated, and I did not take any offence. I just apologised for unnecessarily increasing the number of communication round trips. :) There is nothing else to do to move this along. Based on your testing results, I will simply merge this. :) @bors r+ |
📌 Commit 7415074 has been approved by |
Disable database triggers during import - This allows `updated_at` field values to be loaded during import. Before this change, `updated_at` was set to when the import ran, overwriting the values being loaded. - Also, the import completes in about one fifth the time (on my slow MacBook Air).
☀️ Test successful - checks-travis |
I want your bugs!! |
Sure, sure, you will get them! Several of them have actually been posted in the last few days, but I'll take a look this evening what's still remaining on my list. |
updated_at
field values to be loaded during import.Before this change,
updated_at
was set to when the import ran,overwriting the values being loaded.
(on my slow MacBook Air).